AIDArabic: A Named-Entity Disambiguation Framework for Arabic Text

Authors

  • Mohamed Amir Yosef
  • Marc Spaniol
  • Gerhard Weikum
Abstract

Recent years have seen great progress in automatically generated knowledge bases and in the corresponding disambiguation systems that map text mentions onto canonical entities. These efforts have enabled researchers and analysts from various disciplines to semantically “understand” content. However, most approaches have been designed specifically for English, and support for Arabic in particular is still in its infancy. Since the amount of Arabic Web content (e.g. in social media) has increased dramatically in recent years, we see great potential for endeavors that support entity-level analytics of these data. To this end, we have developed a framework called AIDArabic that extends the existing AIDA system with additional components that allow the disambiguation of Arabic texts based on an automatically generated knowledge base distilled from Wikipedia. Furthermore, we overcome the remaining sparsity of the Arabic Wikipedia by exploiting the interwiki links between Arabic and English content in Wikipedia, thus enriching the entity catalog as well as the disambiguation context.
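The interwiki idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data, the function name `build_name_dictionary`, and the choice to skip Arabic pages that lack an interwiki link are all assumptions made for illustration. The sketch maps Arabic surface names to candidate entities identified by their English Wikipedia titles.

```python
from collections import defaultdict

# Hypothetical toy data, not taken from the paper.
# Arabic Wikipedia page title -> English Wikipedia page title (interwiki link).
INTERWIKI_AR_TO_EN = {
    "القاهرة": "Cairo",
    "مصر": "Egypt",
}

# Arabic surface name -> Arabic page titles it may refer to
# (for example, collected from link anchor texts).
ARABIC_NAME_TO_PAGES = {
    "القاهرة": ["القاهرة"],
    "مصر": ["مصر", "القاهرة"],
    "الإسكندرية": ["الإسكندرية"],  # no interwiki link in this toy data
}


def build_name_dictionary(name_to_pages, interwiki):
    """Map each Arabic name to a set of candidate English entity ids.

    Assumption of this sketch: pages without an interwiki link are skipped.
    """
    dictionary = defaultdict(set)
    for name, pages in name_to_pages.items():
        for page in pages:
            entity = interwiki.get(page)
            if entity is not None:
                dictionary[name].add(entity)
    return dict(dictionary)


name_dictionary = build_name_dictionary(ARABIC_NAME_TO_PAGES, INTERWIKI_AR_TO_EN)
```

In this toy setup an ambiguous Arabic name ends up with several candidate entities, which is the input a disambiguation step would then rank using context.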

Similar resources

AIDArabic+: Named Entity Disambiguation for Arabic Text

Named Entity Disambiguation (NED) is the problem of mapping mentions of ambiguous names in a natural language text onto canonical entities such as people or places, registered in a knowledge base. Recent advances in this field enable semantic understanding of content in different types of text. While the problem has been extensively studied for English text, the support for other languages...

U-AIDA: a customizable system for named entity recognition, classification, and disambiguation

Recognizing and disambiguating entities such as people, organizations, events or places in natural language text are essential steps for many linguistic tasks such as information extraction and text categorization. A variety of named entity disambiguation methods have been proposed, but most of them focus on Wikipedia as a sole knowledge resource. This focus does not fit all application scenari...

Collective approaches to named entity disambiguation

Internet content has become one of the most important resources of information. Much of this information is in the form of natural language text and one of the important components of natural language text is named entities. So automatic recognition and classification of named entities has attracted researchers for many years. Named entities are mentioned in different textual forms in different...

AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data

Over the last decades, several billion Web pages have been made available on the Web. The ongoing transition from the current Web of unstructured data to the Web of Data yet requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework) from these websites. One of the key steps towards extracting RDF from text is the disambiguation of nam...

AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables

We present AIDA, a framework and online tool for entity detection and disambiguation. Given a natural-language text or a Web table, we map mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base like DBpedia, Freebase, or YAGO. AIDA is a robust framework centred around collective disambiguation exploiting the prominence of entities, similarity b...

Publication date: 2014